💰 Trillion Dollar Words in Julia

JuliaCon 2024

Patrick Altmeyer

Thursday, July 11, 2024

TrillionDollarWords.jl


A lightweight package providing Julia users with easy access to the Trillion Dollar Words dataset and model (Shah, Paturi, and Chava 2023).

Disclaimer

Please note that I am not the author of the Trillion Dollar Words paper nor am I affiliated with the authors. The package was developed as a by-product of our research and is not officially endorsed by the authors of the paper.

Context

Developed in the context of our position paper on LLM interpretability presented at ECONDAT 2024 and ICML 2024 (preprint, blog post, code):

Stop Making Unscientific AGI Performance Claims (Altmeyer et al. 2024)

  • Experiments: We probe models of varying complexity, including random projections, matrix decompositions, deep autoencoders, and transformers.
    • All of them successfully distill knowledge, and yet none of them develops true understanding.
  • Social sciences review: Humans are prone to seek patterns and anthropomorphize.
  • Conclusion and outlook: More caution at the individual level, and different incentives at the institutional level.

Basic Functionality

The package provides the following functionality:

  • Load pre-processed data.
  • Load the model proposed in the paper.
  • Basic model inference: compute forward passes and layer-wise activations.
  • Download pre-computed activations for probing the model.

Loading the Data

Sentences

40,000 time-stamped sentences from

  • meeting minutes
  • press conferences
  • speeches

by members of the Federal Open Market Committee (FOMC):

using TrillionDollarWords
load_all_sentences() |>
  x -> names(x)
8-element Vector{String}:
 "sentence_id"
 "doc_id"
 "date"
 "event_type"
 "label"
 "sentence"
 "score"
 "speaker"

All Data

Merged data includes economic indicators

  • Consumer Price Index (CPI)
  • Producer Price Index (PPI)
  • US Treasury (UST) yields
load_all_data() |>
  x -> names(x)
11-element Vector{String}:
 "sentence_id"
 "doc_id"
 "date"
 "event_type"
 "label"
 "sentence"
 "score"
 "speaker"
 "value"
 "indicator"
 "maturity"
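
The merged table is in long format, so each row pairs a sentence with one indicator observation. To illustrate slicing it by indicator, here is a stand-in using plain `NamedTuple` rows with hypothetical values; on the real `DataFrame` returned by `load_all_data()`, `filter` works analogously:

```julia
# Stand-in rows mimicking the merged long format (hypothetical values):
rows = [
    (date = "2023-01-31", indicator = "CPI", maturity = missing, value = 6.4),
    (date = "2023-01-31", indicator = "PPI", maturity = missing, value = 5.7),
    (date = "2023-01-31", indicator = "UST", maturity = "10 Yr", value = 3.5),
]

# Keep only CPI observations; the analogous DataFrames.jl call would be
# filter(:indicator => ==("CPI"), load_all_data())
cpi = filter(r -> r.indicator == "CPI", rows)
```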

Loading the Model

  • Can be loaded with or without the classifier head.
  • Uses Transformers.jl to retrieve the model from HuggingFace.
  • Any keyword arguments accepted by Transformers.HuggingFace.HGFConfig can also be passed.
load_model(; load_head=false, output_hidden_states=true)

Basic Model Inference

From Scratch

Layer-wise activations can be computed as follows:

df = load_all_sentences()
mod = load_model(
  load_head=false, 
  output_hidden_states=true
)
n = 5
queries = df[1:n, :]
layerwise_activations(
  mod, queries
) 

From Artifacts

We have archived activations for each layer and sentence as artifacts:

using LazyArtifacts

artifact"activations_layer_24"

OK, but why would I need all this? 🤔

“There! It’s sentient!”

Motivation

  • “It is essential to bring inflation back to target to avoid drifting into deflation territory.”
  • “It is essential to bring the numbers of doves back to target to avoid drifting into dovelation territory.”


“They’re exactly the same.”

— Linear probe

Embedding FOMC comms

  • We linearly probe all layers to predict unseen economic indicators (CPI, PPI, UST yields).
  • Predictive power increases with layer depth, and probes outperform simple autoregressive (AR) models.
Figure 1: Out-of-sample root mean squared error (RMSE) for the linear probe plotted against FOMC-RoBERTa’s n-th layer for different indicators.
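
The probing setup can be sketched as ordinary least squares from layer activations to an indicator. The snippet below is a minimal, self-contained sketch on synthetic data (all names and numbers are illustrative, not the package API):

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(42)
n, d = 200, 16                      # sentences × activation dimension
X = randn(n, d)                     # stand-in for layer-wise activations
w = randn(d)
y = X * w .+ 0.1 .* randn(n)        # stand-in for an indicator (e.g. CPI)

# Fit the probe on a training split and evaluate out of sample.
train, test = 1:150, 151:200
Xtr = hcat(X[train, :], ones(length(train)))   # add an intercept column
β = Xtr \ y[train]                             # least-squares fit
Xte = hcat(X[test, :], ones(length(test)))
ŷ = Xte * β
rmse = sqrt(mean((ŷ .- y[test]) .^ 2))         # out-of-sample RMSE
```

In the paper's setting, `X` would hold FOMC-RoBERTa activations for a given layer and `y` the indicator values; repeating the fit per layer yields curves like those in Figure 1.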

Sparks of Economic Understanding?

If probe results were indicative of some intrinsic ‘understanding’ of the economy, then the probe should not be sensitive to random sentences unrelated to economics.

Figure 2: Probe predictions for sentences about inflation of prices (IP), deflation of prices (DP), inflation of birds (IB) and deflation of birds (DB). The vertical axis shows predicted inflation levels minus the average probe prediction for random noise.
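
The baseline correction on the vertical axis amounts to subtracting the probe's mean prediction on random-noise sentences. A tiny sketch with purely illustrative numbers:

```julia
using Statistics

# Hypothetical probe predictions (illustrative numbers only):
preds_ip    = [2.1, 2.3, 2.2]   # sentences about inflation of prices (IP)
preds_noise = [1.0, 1.1, 0.9]   # random-noise baseline sentences

baseline = mean(preds_noise)        # average prediction on noise
adjusted = preds_ip .- baseline     # quantity plotted on the vertical axis
```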

Intended Purpose and Goals

Good starting point for the following ideas:

  • Fine-tune additional models on the classification task or other tasks of interest.
  • Further model probing, e.g. using other market indicators not discussed in the original paper.
  • Improve and extend the label annotations.

Any contributions are very much welcome.

Questions?

With thanks to my co-authors Andrew M. Demetriou, Antony Bartlett, and Cynthia C. S. Liem and to the audience for their attention.

References

Altmeyer, Patrick, Andrew M. Demetriou, Antony Bartlett, and Cynthia C. S. Liem. 2024. “Position: Stop Making Unscientific AGI Performance Claims.” https://arxiv.org/abs/2402.03962.
Shah, Agam, Suvan Paturi, and Sudheer Chava. 2023. “Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis.” arXiv preprint arXiv:2305.07972. https://arxiv.org/abs/2305.07972.

Image sources

  • Leonardo DiCaprio: Meme template by user on Reddit

Quote sources

  • “There! It’s sentient”—that engineer at Google (probably!)